Optimising text quality in generation from relational databases

Authors

  • Michael O'Donnell
  • Alistair Knott
  • Jon Oberlander
  • Chris Mellish
Abstract

This paper outlines a text generation system suited to a large class of information sources: relational databases. We focus on one aspect of the problem: the additional information which needs to be specified to produce reasonable text quality when generating from relational databases. We outline how databases need to be prepared, and then describe various types of domain semantics which can be used to improve text quality.

1 Introduction

As the problems of how we generate text are gradually solved, a new problem is gaining prominence: where do we obtain the information which feeds the generation? Many domain models for existing generation systems are hand-crafted for the specific system. Other systems take advantage of existing information sources.

A good information source for text generation resides in the vast number of relational databases which are in use around the world. These resources have usually been provided for some reason other than text generation, such as inventory management or accounting. However, given that the information is on hand, it can be of value to connect these databases to text generation facilities. The benefits include natural language access to information which is usually accessed in tabular form, which can be difficult to interpret. Natural language descriptions are easier to read, can be tailored to user types, and can be expressed in different languages if properly represented.

This paper outlines the domain specification language for the ILEX (Intelligent Labelling Explorer) text generation system [1]. ILEX is a tool for dynamic browsing of database-defined information: it allows a user to browse through the information in a database using hypertext. ILEX generates descriptions of database objects on the fly, taking into account the user's context of browsing. Figure 1 shows the ILEX web interface, as applied to a museum domain, in this case the Twentieth Century Jewellery exhibition at the National Museum of Scotland [2]. The links to related database objects are also automatically generated. ILEX has been applied to other domains, including personnel (Nowson, 1999), and a sales catalogue for computer systems and peripherals (Anderson and Bradshaw, 1998).

[1] Earlier ILEX papers have been based on ILEX 2.0, which was relatively domain-dependent. This paper is based around version 3.0 of ILEX, a re-draft to make the system domain-independent, and domain acquisition far easier. The ILEX project was supported by EPSRC grant GR/K53321.
[2] The authors thank the museum for making their database available.

One of the advantages of using NLG for database browsing is that the system can keep track of what has already been said about objects, and not repeat that information on later pages. Appropriate referring expressions can also be selected on the basis of the discourse history. The object descriptions can be tailored to the informational interests of the user. See Knott et al. (1997) and Mellish et al. (1998) for more information on these aspects of ILEX.

In section 2, we consider some systems related to the ILEX system. Section 3 describes the form of relational database that ILEX accepts as input. Section 4 outlines what additional information (domain semantics) needs to be provided for coherent text production from the database, while section 5 describes additional information which can be provided to improve the quality of the text produced.

2 Related Work

It should be clear that the task we are discussing is quite distinct from the task of response generation in a natural language interface to a database (e.g., see Androutsopoulos et al. (1995)). In such systems, the role of text planning is quite simple or absent, usually dealing with single sentences or, in the most complex systems, a single-sentence answer with an additional clause or two of supporting information. ILEX is not a query response generation system; it is an object description system. It composes a full text, at whatever size, with the goal of making that text a coherent discourse.
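The discourse-history idea mentioned in the introduction (keeping track of what has already been said about an object so that later pages do not repeat it) can be illustrated with a minimal sketch. This is not ILEX's implementation: the `facts` table layout, the sample rows, and the function names `build_demo_db` and `describe` are all hypothetical, chosen only to show the principle on a toy relational database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'facts' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (object_id TEXT, attribute TEXT, value TEXT)"
    )
    # Illustrative rows only; not taken from any real museum database.
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [
            ("J-001", "designer", "Designer A"),
            ("J-001", "material", "silver"),
        ],
    )
    return conn


def describe(conn, object_id, said):
    """Return sentences for facts about object_id not yet mentioned.

    'said' is the discourse history: a set of (object_id, attribute)
    pairs that have already been expressed on earlier pages.
    """
    rows = conn.execute(
        "SELECT attribute, value FROM facts "
        "WHERE object_id = ? ORDER BY rowid",
        (object_id,),
    ).fetchall()
    sentences = []
    for attribute, value in rows:
        key = (object_id, attribute)
        if key in said:
            continue  # already told to this user; do not repeat it
        said.add(key)
        sentences.append(f"The {attribute} of {object_id} is {value}.")
    return sentences


conn = build_demo_db()
history = set()
first_page = describe(conn, "J-001", history)   # both facts are new
second_page = describe(conn, "J-001", history)  # nothing left to say
```

A real system would use the same history to choose referring expressions and to tailor content to the user, which this sketch does not attempt.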

Similar resources

Natural Language Access to Relational Databases through STEP

This paper introduces the STEP system for natural language access to relational databases. In STEP the administrator couples phrasal patterns to elementary expressions within a decidable fragment of tuple relational calculus. This phrasal lexicon serves as a bi-directional grammar, enabling the generation of natural language from tuple relational calculus and the inverse parsing of natural lang...

A Phrasal Approach to Natural Language Interfaces over Databases

This report introduces the STEP system for natural language access to relational databases. In contrast to most work in the area, STEP adopts a phrasal approach; an administrator couples phrasal patterns to elementary expressions of tuple relational calculus. This ‘phrasal lexicon’ is used bi-directionally, enabling the generation of natural language from tuple relational calculus and the inver...

Clinical records anonymisation and text extraction (CRATE): an open-source software system

BACKGROUND Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. RESULTS This article ...

Improvement of generative adversarial networks for automatic text-to-image generation

This research concerns the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous research has used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, each given as a sentence, to produce and improve the image. The proposed ...

A STEP Towards Realizing Codd's Vision of Rendezvous with the Casual User

This demonstration showcases the STEP system for natural language access to relational databases. In STEP an administrator authors a highly structured semantic grammar through coupling phrasal patterns to elementary expressions within a decidable fragment of tuple relational calculus. The resulting phrasal lexicon serves as a bi-directional grammar, enabling the generation of natural language f...


Publication date: 2000